The MSSP NCNM Team

The MSSP NCNM Presentation - Professor: Haviland Wright

  • Group 1: Jimmy Ye, Jinyu Li, Yuli Jin

  • Group 2: Daniel Xu, Kayla Choi, Nancy Shen

  • Group 3: Mi Zhang, Boyu Chen, Shicong Wang, Biyao Zhang

  • Group 4: Keliang Xu, Yingjie Wang, James He, Ruining Jia

Our Partners

  • Aidan O’Hara: has been preparing for this project since late July

  • Alison Turner: a Community Development Planner at NCNMEDD and recent MSSP graduate

  • Allen Razdow: founder and president of True Engineering Technology, LLC and originator of Truenumbers

Project Background

  • The current developing situation in NCNM:

NCNM is at a turning point. Historically, the region has had few resources to acquire grants and to administer them successfully in order to complete projects. With pandemic-related dollars flowing to the region, it finally has capital to spend on some of its most needed projects. Broadband access and outmigration are the two biggest existing issues.

  • What approaches are used for collecting data:

Census, mostly. They do not collect much data from their own office, but would welcome recommendations on gaps in the census data or on insufficiencies they are seeing in the census for the region.

  • What variables will we use for this project? On what scales are they measured:

Demographics (categorical).
Income (numerical), range: 0-1,000,000,000,000 (unsure if this is the maximum), gross receipts taxed.
Unemployment rate (numerical).
GDP (numerical).
Number of business establishments (numerical).

Project focus

The ED-900 form must accompany all EDA grant applications. Here’s an example:

Ultimate Goal:

  • TrueNumbers database that can be accessed by NCNMEDD and local government staff to assist with grant applications.

  • An analysis of the data from the region. Census response rates here are fairly low, which could lead to data quality issues. If such issues exist, we need to find supplemental sources of data to improve inferences made about the region.

Focus for this semester:

  • TrueNumbers

  • Dive into what the census is, why it’s important, and how low response rates may pose an issue.

Our approach

Truenumbers

Data

Data Source

Our data comes from the ACS (American Community Survey).

The ACS is a large demographic survey collected throughout the year using mailed questionnaires, telephone interviews, and visits from Census Bureau field representatives to about 3.5 million household addresses annually.

Data availability for geographic areas differs by population size:

1-year estimates are available for areas of population 65,000 or more, while 5-year estimates are available for all areas.
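The availability rule above can be written as a small lookup. This is only a sketch: the function name acs_products is hypothetical and is not part of the project code, and the populations passed to it are illustrative values, not figures for any real county.

```r
# Hypothetical helper (not from the project repository): list the ACS
# estimate products available for an area, given its population.
# Rule taken from the slide: 1-year estimates need a population of
# 65,000 or more; 5-year estimates are available for all areas.
acs_products <- function(population) {
  if (population >= 65000) {
    c("1-year", "5-year")
  } else {
    "5-year"
  }
}

# Illustrative populations only.
large_area <- acs_products(70000)
small_area <- acs_products(12000)
print(large_area)
print(small_area)
```

For areas below the threshold, only the 5-year estimates can be used.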

Parameter interpretation

Estimates are produced for:

  • demographic characteristics (sex, age);

  • social characteristics (school enrollment, educational attainment);

  • economic characteristics (employment status, commuting to work);

  • housing characteristics (housing occupancy, units in structure).

In this presentation, we basically focused on

What did we do?

We designed functions that clean the data so that everyone can work with it using simple filter and select operations.

Functions that modify the dataset:

  • clean_tag: simplifies the subject and tag columns

  • get_county: simplifies the county column

Function that prepares the data for further analysis:

  • get_county_data: retrieves the data for the different counties

A few steps are still needed to get the data:

  • Connect to the BU VPN: vpn.bu.edu

  • Run the following code and wait a few seconds:
    source(file = "data_clean/mexico_screen_function.R")
    data <- get_county_data()
    data %>% view()
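Once the data is loaded, a filter and select step might look like the sketch below. It uses base R so that it runs without the VPN or any packages. The toy table stands in for the output of get_county_data(). Its column names (county, subject, estimate) and all of its values are assumptions made up for illustration, so the real columns may differ.

```r
# Toy stand-in for the table returned by get_county_data().
# Column names and values are invented for illustration only.
toy <- data.frame(
  county   = c("County A", "County A", "County B"),
  subject  = c("income", "unemployment", "income"),
  estimate = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# "Filter": keep the rows for one county.
# "Select": keep only the subject and estimate columns.
county_a <- toy[toy$county == "County A", c("subject", "estimate")]
print(county_a)
```

With the tidyverse loaded, the same step could be written with filter() and select() in a %>% pipeline.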

EDA Appetizer
